Nature Magazine Research Articles

The workflow of ecological scientists is currently undergoing a quiet revolution. Recent years have witnessed a strong push towards more open sharing of research data (Hampton et al. 2015), and the vast amount of field data being generated by individual ecologists is becoming available to the wider research community at an unprecedented rate. The wide availability of data has also pushed scientists to focus more intensively on the process of data analysis. Where computer programming ability was restricted to a very small subset of researchers just a few years ago, the new generation of ecologists are trained programmers, developing novel software for analyses and exploring new ways to share and visualize data. This development is moving the field away from click-and-calculate (Graphical User Interface, GUI) statistical packages. A new paradigm has emerged, where individual scientists download, curate and share large amounts of data and analyse them using reproducible software packages and scripts written in languages such as R, Python and Julia. This increased focus on analytical methods has led to a number of key developments in scientific sharing and publishing, one of which is the Software notes format here at Ecography, first introduced in 2008 (Pettersson and Rahbek 2008). The purpose of Software notes was to create a platform for disseminating high-quality analytical tools for ecology, to increase scientific transparency by opening the possibility for researchers to subject their techniques to traditional rigorous peer review, and finally to support a transition where authors would receive scientific credit for their intellectual contributions in developing the most widely used methods in ecology.
Over the following 8 years, this idea has taken root, and today dedicated journals such as Methods in Ecology and Evolution and Environmental Modelling and Software have followed suit in publishing software and computational methods as independent intellectual contributions. These developments in ecology have exciting implications: the availability of large amounts of data and the explosion in the analytical capabilities of ecologists, together with the potential for rapid dissemination of ideas in today's internet-based scientific community, mean that ecology is moving forward rapidly, with a steep growth in the number of research papers. But this development also poses some important challenges. The large amount of project-specific software being generated for analytical studies means that analytical standards are harder to establish, potentially limiting the reproducibility of much recently published science. Also, analytical and coding errors may escape detection, with potentially highly problematic results, such as when Geoffrey Chang and colleagues had to retract three Science papers after discovering an error in a homemade data-analysis program (Miller 2006). Substantial progress in our understanding of ecology rests on trustworthy, reproducible and transparent data analysis. Reproducibility is the very hallmark of the scientific method. However, there is increasing concern that many studies today might not be reproducible. The focus on novelty in ‘high-impact’ journals means that there is little incentive for researchers to directly replicate published studies, and this lack of replication has come under increasing scrutiny (Iqbal et al. 2016, The Economist 2016). More worrying still, efforts to systematically replicate published studies have often failed (Open Science Collaboration 2015).
This is so concerning that it has led to the founding of a new journal dedicated explicitly to the replication of published results (the Preclinical Reproducibility and Robustness channel of F1000 opened 4 February 2016). A recent comment in Nature magazine (Allison et al. 2016) reported on the widespread problem of reproducibility in the natural sciences, and gave a sobering account of the obstacles to overcoming it, which include the pressure on researchers to publish, the lack of established pathways for dealing with non-reproducible articles, and consistent issues with the statistical treatment of data (Krumholz 2016, The Economist 2016). Ecology faces particular challenges in reproducibility because data collection is often context dependent (Ellison 2010), and because there are few established standards for storing metadata and facilitating study replication. The keys to a greater level of reproducibility in ecology are to establish analytical protocols that are robust and transparent, to faithfully document the analytical process including any failed attempts, and to ensure that the storage and acquisition of data are documented and include the appropriate metadata. Fortunately, recent technological developments promise to increase the reproducibility of ecological analyses by establishing documentable and standardised workflows, where the process of data acquisition, analysis and graphical output is integrated and documented throughout, and collaborative work is integrated into the software itself. Such developments have thus far received some attention, mainly among younger scientists, and largely outside of the primary literature. The special issue ‘Tools for Reproducibility in Ecology’ seeks to promote the quest for a reproducible ecological science and highlight recent developments, while presenting a collection of software notes that aim to explicitly further scientific reproducibility in ecological data analysis.
The issue includes the following software notes:
mangal – a standard for sharing network data, including a web service for accessing it and an R package front-end (Poisot et al. 2016a).
ENM – a tool for species distribution models with explicit workflow and structures for sharing and documentation of analytical methods (De Giovanni et al. 2016).
geoknife – a tool for acquiring geographical data from large databases (Read et al. 2016).
macroeco – a Python environment for macroecological analysis, with a scripting GUI (Kitzes and Wilber 2016).
sdm – an extensible tool for species distribution modelling that provides a standardized and unified structure for handling species distribution data and for modelling distributions with correlative and mechanistic approaches (Naimi and Araújo 2016).
biogeo – a tool for programmatically detecting and correcting errors in widely used species–occurrence databases (Robertson et al. 2016).
helminthR – a tool for downloading data on host–parasite interactions from online sources (Dallas 2016).
Also included is a guest editorial by Poisot et al. (2016b) that highlights workflows and methods for working with datasets synthesized across several sources. These tools exemplify different aspects of a data analysis workflow with the potential to improve the reproducibility of ecological research (Table 1). Such a workflow involves presenting and documenting standards for data and metadata storage and communication, documenting the process of data acquisition, relegating analytical steps to online facilities with well-documented protocols, documenting analytical work on GUI platforms, establishing clear analytical workflow protocols, and emphasizing unit-testing and quality control of analytical software. In the following, we describe how each of these approaches can play a role in ensuring scientific reproducibility. Not long ago, ecologists wanting to describe the natural world collected the data themselves.
Some data would eventually be published, but much would remain in notebooks or, more recently, on computer hard drives, eventually disappearing at the retirement of the researcher and leading to an inevitable decay in data availability (Vines et al. 2014). Today's online platforms allow data to be used for answering questions beyond the purpose for which they were collected, and thus they become a shared resource for the global research community. Consequently, there is a push towards seeing data as a scientific product in itself, and there is ongoing work to develop a system that gives the generation of data suitable attribution (Mooney and Newton 2012, Data Citation Synthesis Group 2014). The development towards a larger degree of data re-use and sharing has the potential to speed the pace of scientific discovery, but is not without problems, as reported by Poisot et al. (2016b) in the present special issue. The authors describe an approach to working with large-scale synthetic data sets, and discuss many of the pitfalls. One powerful tool to deal with such pitfalls is to agree on reproducible and standardised sampling methods across systems and localities (Nogués-Bravo et al. 2011); but even in the absence of standardized sampling, a significant improvement can be gained by agreeing on standards for saving and sharing data. Such a standard is described for network analyses in the software note describing mangal (Poisot et al. 2016a). The standard is explicitly formulated to make data acquisition and, just as importantly, data deposition as simple and straightforward as possible, while at the same time encouraging the deposition of as much useful metadata as possible. The mangal format is aimed at efficient parsing by machines, and comes with an associated web service and R package for easy download and deposition.
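To make the idea of a machine-parseable data standard concrete, the following is a minimal sketch in Python of a network record that carries its own metadata and is checked before it is serialized. It is an illustration only: the field names, the list of required metadata and the example record are assumptions invented for this sketch, and they are not the actual mangal specification or its R interface.

```python
import json

# Hypothetical required metadata fields; NOT the actual mangal schema.
REQUIRED_METADATA = ("name", "description", "latitude", "longitude", "date")


def validate_network(record):
    """Return a list of problems found in a network record (empty if none)."""
    problems = []
    metadata = record.get("metadata", {})
    for field in REQUIRED_METADATA:
        if field not in metadata:
            problems.append(f"missing metadata field: {field}")
    # Every interaction must refer to taxa declared in the same record.
    taxa = set(record.get("taxa", []))
    for source, target in record.get("interactions", []):
        for taxon in (source, target):
            if taxon not in taxa:
                problems.append(f"interaction refers to unknown taxon: {taxon}")
    return problems


def serialize(record):
    """Serialize a record to JSON with stable key order, refusing invalid input."""
    problems = validate_network(record)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(record, sort_keys=True, indent=2)


# Illustrative record, invented for this sketch.
example = {
    "metadata": {
        "name": "example pollination network",
        "description": "illustrative record, not real data",
        "latitude": 55.7,
        "longitude": 12.6,
        "date": "2015-06-01",
    },
    "taxa": ["Bombus terrestris", "Trifolium pratense"],
    "interactions": [["Bombus terrestris", "Trifolium pratense"]],
}

if __name__ == "__main__":
    print(serialize(example))
```

Rejecting a record at deposition time, rather than at analysis time, is one way a standard can encourage complete metadata without adding work for later users of the data.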
A similar but smaller effort is made by the helminthR package (Dallas 2016), which presents a direct programmatic interface to the Natural History Museum of London's database of host–parasite interactions of helminth worms. It is worth noting that the creation and adoption of data standards can be a long and arduous process (Edwards et al. 2011). Given that publications remain the primary currency of a scientific career, the painstaking curation of data, including preparation of detailed metadata and establishment of standards, can be viewed as an unrewarded burden, especially for early career scientists. This perception might be gradually changing as journals begin to require data deposition with publication (Bloom 2014, Sandhu 2014) and allow data to become a stand-alone publication in journals such as Nature's journal Scientific Data. Tools such as mangal (Poisot et al. 2016a) facilitate the usage of archives and the contribution of data in a standard format, reducing the burden of proper data curation. The simultaneous increase in incentives from journal editors and the creation of software tools that facilitate standards go a long way towards increasing reproducibility. In addition to allowing for data re-use and synthetic data sets, well-developed standards play a crucial role in allowing for replicability of published studies: any researcher should be able to reproduce all presented results in a network study using mangal by downloading the data directly through the package and following the analytical steps described in the methods section of the paper. The same approach to data accessibility applies to other types of data. Not all of the data used by ecologists can be characterized as observational ecological data: researchers in ecology use data on climate, geology, topography and other abiotic factors shaping the environment.
Ecological research is increasingly reliant on data products from remote sensing, thus drawing on what can be rightfully called big data (Hampton et al. 2013), in which the acquisition, curation, and preprocessing of data products are an integral part of ecological analyses. While these data products are an important component of macroecological workflows, they are often only available at spatial scales much larger than a researcher might want (e.g. the Oregon PRISM project data) and require post-download preprocessing. However, this post-download preprocessing can inhibit reproducibility on two fronts: first, it may be ‘ad hoc’ and not well documented, and second, it may require computational power that is not available to most users. The geoknife R package (Read et al. 2016) offers a way to ensure reproducibility in these data acquisition and preprocessing steps, by delegating the analytical steps to online data providers, which generally implement well-documented and transparent procedures. geoknife offers a protocol to derive summary data (such as the monthly standard deviation of a high-resolution data set on temperatures) within an area defined exactly by the study area. The protocol makes the process of data acquisition easy for the ecologist, reduces the computational demands on local computer systems, reduces the error rate in the data preprocessing step, and allows for easy reproducibility by allowing researchers to report a few simple lines of code that generate the input data in models. A more encompassing approach to ensuring reproducible workflows is made possible by dedicated workflow systems like Kepler (Altintas et al. 2004) or the open-source project Taverna (Wolstencroft et al. 2013). Such systems present graphical platforms for calling web services, accessing online data, and running established analytical steps on the data.
The approach makes for very clear and highly reproducible science, and encourages the use of established protocols for analysis whenever they are available and feasible. Taverna was originally developed for molecular biology, but its use is not restricted to this field. In this issue, De Giovanni et al. (2016) provide Taverna components, scriptable in R, for environmental niche modelling (ENM), also known as bioclimatic envelope modelling or species distribution modelling (SDM) (Peterson et al. 2011). Whether the standardized protocols that are made feasible by such workflow tools will ever dominate the analytical toolbox of individual researchers is an open question. However, there is no doubt that dedicated workflow tools offer a very powerful platform for collaboration among larger groups or big field-based projects, a type of research organization that is in itself on the rise in ecology. A slightly less encompassing approach based on scripting may be more attractive to individual researchers. In terms of supporting reproducibility, the increasing prevalence of scripting languages such as R and Python provides substantial improvements over point-and-click software packages, in that they allow analyses to be replicated exactly by re-running a script, which is a more stringent representation of analytical choices than is possible within the methods section of a short-format research paper. However, scripts are not always shared along with the paper, they are often poorly annotated, and they require special knowledge to read and are often difficult to read even for those who have that knowledge. Also, scripting languages limit reproducible analyses to users with programming skills, and such skills are not common among large groups of ecologists, especially the important sector of ecologists based outside universities.
In this special issue, Kitzes and Wilber (2016) provide one innovative way of making reproducible documentation of analysis available to non-programmers. Along with macroeco, a package of macroecological tools programmed for the widespread and powerful Python language, they provide a windows-based GUI platform allowing analyses to be easily specified, and subsequently run by the underlying software. The GUI saves a small script file that exactly specifies the analyses performed and is succinct enough to include directly within the methods section of a paper. By linking the article text so closely to the analytical process itself, the approach represents a powerful way of ensuring reproducibility. Taking the graphical and accessible approach even further, Naimi and Araújo (2016) provide the sdm package for species distribution models, which offers a fully fledged GUI where models and analytical choices can be specified in a well-known and user-friendly format. The GUI, which is based on R's shiny package, converts the specified analytical settings directly into a standardized R script that can be shared along with the paper. In addition, the GUI offers the opportunity to save analytical settings as a binary data object, which can be shared among collaborators and modified, ensuring that analyses can be reproduced directly from within the GUI. The package also enhances analytical reproducibility in two other ways that were discussed earlier: it allows data preprocessing to be handled within the package itself, ensuring that it is standardized and reproducible; and it allows the application of practically the entire range of different techniques for SDM within the same framework, ensuring that differences between analytical results derive explicitly from the differences between methods, rather than from ad hoc assumptions made by different software.
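The idea of a short, saved specification that both documents and drives an analysis can be sketched in a few lines of Python. This is an illustration of the general technique only: the JSON layout, the `ANALYSES` registry, the `run_spec` function and the abundance counts are all hypothetical, and they do not reproduce the file formats or functions of macroeco or sdm.

```python
import json
import statistics

# Hypothetical registry of named analyses; a real package would expose its own.
ANALYSES = {
    "mean": statistics.mean,
    "median": statistics.median,
    # Richness here simply counts entries with a non-zero abundance.
    "richness": lambda values: sum(1 for v in values if v > 0),
}


def run_spec(spec_text, data):
    """Run every analysis named in a JSON specification against `data`."""
    spec = json.loads(spec_text)
    column = data[spec["column"]]
    results = {}
    for name in spec["analyses"]:
        if name not in ANALYSES:
            raise KeyError(f"unknown analysis: {name}")
        results[name] = ANALYSES[name](column)
    return results


# The specification is short enough to quote in a methods section.
SPEC = '{"column": "abundance", "analyses": ["richness", "mean", "median"]}'

# Invented abundance counts, for illustration only.
DATA = {"abundance": [0, 3, 5, 0, 12]}

if __name__ == "__main__":
    print(run_spec(SPEC, DATA))
```

Because the specification is plain text, the same string that appears in the paper can be fed back to the software, so the reported analysis and the executed analysis cannot drift apart.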
A final, and crucial, aspect of reproducibility is to minimize the number of errors in published data. If studies cannot be replicated, it might be a sign that the analysis was flawed or that the reported results were untrustworthy, a situation that is highly detrimental to the quality and integrity of science. The few analyses that have been performed to quantify the prevalence of such errors (Simundic and Nikolac 2009, Gilbert et al. 2012, Open Science Collaboration 2015) indicate that they are more common in submitted and published papers than often anticipated. How serious these errors are for the advancement of science is still unknown, but they are potentially a very serious issue. Relying on well-established analytical tools is one way to minimize the number of errors, although computer programs, such as R packages, are rarely peer-reviewed and by no means exempt from errors. Journals publishing software-note formats can help reduce these errors by ensuring that published analytical software has a comprehensive test suite that ensures internal consistency and error catching within the software; for example, the macroeco package (Kitzes and Wilber 2016) in this issue is covered by 135 internal unit tests, all of which are available in the package's GitHub repository. Errors of recording are another major complication, well known among researchers extracting ecological data from field notebooks. The problem is greatly exacerbated by the reliance on large online databases based on such data, which pervade many modern ecological analyses. The thorny issue of errors finding their way into analyses has prompted the creation of automatic tools to detect and correct errors and inconsistencies, such as the widely used Taxonomic Name Resolution Service (Boyle et al. 2013), which resolves the identities of plant species based on taxonomic synonymy, and also corrects for spelling errors. Robertson et al.
(2016) present the biogeo package, which offers facilities for correcting common errors and quality issues with occurrence records found in large data bases. Not only does the software highlight potential errors, it also provides probable suggestions for the correct entries and allows the user to correct them in an easy and reproducible manner. Likewise, sdm includes R functions to correct for spelling errors while coding and parameterizing species distribution models (Naimi and Araújo 2016). The software notes in this special issue all contribute to a more reproducible ecology in which analyses rest on solid, error-checked software, without stymieing the free growth of creative analytical ideas; and where documentation and metadata support a solid foundation under today's fast-moving integrative ecological research field. The notes were chosen to highlight a breadth of topics and approaches that are required to ensure reproducibility. The sooner these considerations are integrated into our workflows and collaborations, the stronger the foundation of the ecology we build for the future.
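As an illustration of the kind of automated checks discussed above, the following Python sketch flags suspicious coordinates in occurrence records and suggests a probable correction for a misspelled species name. It is a simplified example under stated assumptions: the function names, the similarity cutoff and the rule for detecting exchanged coordinates are hypothetical choices made here, and they are not the methods implemented in biogeo, sdm or the Taxonomic Name Resolution Service. Name matching uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib


def check_coordinates(lat, lon):
    """Classify a coordinate pair as 'ok', 'swapped?' or 'invalid'."""
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return "ok"
    # A latitude outside [-90, 90] that would be valid as a longitude
    # suggests the two columns were exchanged during data entry.
    if -90 <= lon <= 90 and -180 <= lat <= 180:
        return "swapped?"
    return "invalid"


def suggest_name(name, accepted_names, cutoff=0.8):
    """Return the closest accepted name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(name, accepted_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    accepted = ["Quercus robur", "Fagus sylvatica"]
    print(check_coordinates(120.0, 45.0))
    print(suggest_name("Quercus robar", accepted))
```

Small, pure functions like these are also easy to cover with unit tests, which is one practical route to the error-checked software that the software notes aim for.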

A large lecture hall in the main building at CERN, the Canine Experimental Research Network headquarters, somewhere in Europe. The hall is packed with science reporters from many nations. An expectant buzz is heard throughout the room. At the front of the hall is a stage with a podium. TV cameras and microphones are clustered around the front of the stage, aimed at the podium. A door to the right of the stage opens, and a large chocolate Labrador retriever walks across to the podium. A small, wheat-colored spaniel-poodle mixed breed trots after him and stands off to the side. The Labrador begins (Figure 1). [Figure 1: Mink and Clifford proudly show off the effects of the Briggs Noson.] Mink: Good morning, I'm Dr Mink, the head of one of the two experimental consortia here. I'll be making the first presentation. It's OK to ask questions during my remarks, if there's something you don't understand. (Clears throat.) As you know, the Standard Poodle Model of the Universe is perhaps the greatest intellectual achievement of the branch of science called muttaphysics. But you also know that the Standard Poodle Model has been unable to explain one of the deepest mysteries of the cosmos, namely, why a dog's nose is always cold. It doesn't matter if it's cold outside or hot outside; the nose is always cold. It has been appreciated, for many years, that this fact violates the Second Law of Thermodynamics. Clifford (suddenly singing): Heat won't pass from a cooler to a hotter; You can try it if you like, but you far better notter, 'Cause the cold in the cooler will get hotter as a ruler, 'Cause the hotter body's heat will pass to the cooler... Mink (sharply): Stop that! (To the audience) Excuse me, please. My colleague was just channeling Flanders and Swann. Where was I? Oh, yes. The great mystery of how a dog's nose can stay cold even when it's very hot outside.
With the recent discovery of Bark Matter, this is clearly the fundamental problem remaining, and therefore is occupying the attention of muttaphysicists worldwide. You will recall that, in 1974, Hermann Briggs proposed a solution to the problem, by hypothesizing the existence of a particle that could carry heat away from the tip of the nose. The one you journalists sometimes call 'the dog particle'. Reporter from Nature magazine: You mean... Mink: That's right: the Briggs Noson. We've called this meeting because we have a monumental discovery to announce with respect to this particle. (An excited murmur races through the room.) You will recall that, in his initial theory, Briggs proposed that the Briggs Noson needed to have a mass somewhere between zero and a gazillion electron volts. Since then, almost 40 years of astronomically expensive high-energy muttaphysics experiments have narrowed that range down to between 1 electron volt and a gazillion electron volts. Now let me show you the new data, about which we are so excited. They come from two completely different, independent experiments. I'll present the results of the first one, and my colleague - (glances at Clifford, who has started to wander off aimlessly) Sit! Stay! - Sorry, my colleague, Dr Clifford, will present the others. You will be amazed at the agreement. (He brings up the first slide on a large screen. It shows a large black dog and smaller white rabbit facing each other in a tunnel.) These data come from the LHC, the Labrador Hare Collider. In this experiment, we smash the cold nose of a Labrador into the warm nose of a rabbit and look for evidence of a particle that would transfer heat away from the dog's nose to that of the bunny. (The slide changes to a movie, in which the dog and the rabbit repeatedly butt noses against each other.) After years of data collection at a cost of bazillions of euros, we produced the data shown on the next slide. 
(The movie is replaced by a slide showing a graph with an enormous amount of noise and two tiny blips a fraction above the noise.) These observations allow us to make a definitive statement about the existence of the Briggs Noson for the first time. But before we do that, my colleague will show the results of the other experiment. Clifford (stepping to the podium): I'm going to show you the results of the completely independent and totally different experiments from the HLC, the Hound Lepus Collider. The next slide shows the experimental configuration. (Slide changes to show what looks like the same black Labrador and the same white rabbit facing each other in the same tunnel.) Reporter from Science magazine: That looks like exactly the same experiment as the first one. Clifford (indignantly): Well, it's not! The LHC experiment is carried out on the third floor of CERN, but my HLC experiment is carried out on the second floor. So you see, the conditions are completely different. Reporter from Physics Yesterday: And just how much did this experiment cost? Clifford: At least two bazillion euros. But it was worth it: in the movie that's coming up on the screen now, you will see the actual historic data as they were being collected. (The screen changes to a film showing, through a transparent window in the tunnel, the dog and the rabbit butting heads. Outside the window, Clifford, Mink, and a number of other scientists are gazing with rapt attention at a huge cathode ray detector that must have cost squintillions of euros. The detector is mostly completely blank. Suddenly a blip occurs.) Clifford: There! Did you see it? (Another blip appears.) See - there's another one. (This continues for a time.) Reporter for the News of the World (suddenly): Say, aren't those blips appearing every time the chocolate Lab scratches himself? Could he have been jostling the detector accidentally? (Mink and Clifford stare at the film for a few moments.) 
Mink (hastily changing to the next slide): Er, possibly, but it doesn't change our revolutionary conclusion (the screen shows another set of data, as noisy as the first, with three blips a fraction above the sea of noise). Reporter for USA Today: You mean, you have proof that the Briggs Noson exists? Mink: We can't conclude that, no. Reporter for Oggi: You mean, you have proof the Briggs Noson doesn't exist? Clifford: We can't conclude that either, no. Reporter for Der Stern: Well, what exactly can you conclude? Mink: We can conclude that, if the Briggs Noson does exist, it has a mass between 2 electron volts and a gazillion electron volts. We have been able to narrow down the possible mass range considerably. Reporter from Japan Times: You at CERN wouldn't be approaching a decision on another round of funding by any chance, would you? Clifford: Possibly. Reporter from EMBO Journal (who doesn't really belong here but, like the others, wasn't about to turn down a free trip to Geneva): So this was all just a publicity stunt to get us to write stories that would convince the politicians who pay for this stuff that it's worth pouring more quadrillions of euros into a bunch of experiments that haven't found anything definitive yet and possibly never will! Sheesh - what a letdown! I mean, what if the people who sequenced the human genome had announced that they had completed the sequence but had no idea how many genes there were? Clifford: Isn't that exactly what they did announce? Mink: And as for this being all about the publicity and funding, isn't that what most Big Science projects do? They release enough periodic announcements of 'important discoveries' that aren't really all that important when you look at them carefully, and that by coincidence tend to come out more frequently when their funding is up for renewal. 
Look at the structural genomics projects, and the genome wide association studies, and the cancer genome program, and the - Reporter from The Australian: But that doesn't mean the muttaphysicists should keep doing it! You Briggs Noson hunters fooled us once before, you know. Mink: You came this time, didn't you? (With groans of disgust, the reporters file out of the lecture hall. Mink and Clifford are left alone on the stage.) Clifford (to Mink): Maybe we finally overdid the hype before this event this time. What if they never come back? Mink (wisely): Oh, they'll be back. They always come back. After all, they have a nose for news.

Articles published on Nature Magazine

“A Truer Test of Woodcraft”: Bell & Howell Camera Advertising, Nature Magazine, and the Creation of the Modern “Camera Hunter”

The need for the development of discipline-specific approaches to address academic bullying.

STEM the bullying: An empirical investigation of abusive supervision in academic science.

Quantification of Results in the Social Sciences and Humanities – Citation Counts as a Measure of Historiographical Achievement in the Case of the Institute for Contemporary History

STEM the Bullying: An Empirical Investigation of Abusive Supervision in Academic Science

‘Business of science’ digest—November 2020

Doctoral students report high levels of anxiety, depression

Conwy Lloyd Morgan, Methodology, and the Origins of Comparative Psychology.

Moonbit: Poetry from Apollo 11’s Computer Code

Simulation Technology: The Evolution of the Power System Network [History]

Stem-cell research and regenerative medicine in China

Towards a more reproducible ecology

Thought-shape fusion in young healthy females appears after vivid imagination of thin ideals

The Use of Modern Software in the Systematization of Literature Data on Intranasal Medicinal Products

How Nature Magazine consistently prefers anecdote over data

Echo: The Editor’s Wisdom with the Elegance of a Magazine

North American Mammals and the American Black Bear, Ursus Americanus

The dog particle

Manuscript format and specification for submission to Nature Magazine

Lead the way for us
